Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix regular expression parsing #3244

Merged
merged 3 commits into from
Jul 19, 2024
Merged

Conversation

Seldaek
Copy link
Contributor

@Seldaek Seldaek commented Jul 15, 2024

I moved the grammar file into the repo instead of the hoa regex one with patch, that makes it a lot easier to work with.

This PR fixes parsing of complex character classes which was really not well supported in hoa/regex, along with a bunch of other smaller issues.

@Seldaek Seldaek force-pushed the bug11323 branch 3 times, most recently from f3b0770 to 56a35df Compare July 17, 2024 15:05
@@ -372,12 +372,13 @@ private function parseGroups(string $regex): ?array
{
if (self::$parser === null) {
/** @throws void */
self::$parser = Llk::load(new Read('hoa://Library/Regex/Grammar.pp'));
self::$parser = Llk::load(new Read(__DIR__ . '/../../../resources/RegexGrammar.pp'));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ondrejmirtes Are we ok loading the file from here instead? Not sure what this does in terms of licensing, nor if there's a better place to put it. If so I'll delete the patch file and drop the hoa/regex requirement.

Copy link
Contributor

@staabm staabm Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this path will work in the packaged phpstan.phar - while I agree that having the RegexGrammar.pp in the phpstan-src repo might ease the contribution experience - as the patching is really cumbersome.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely agree moving from the patching as the patching is very hard to understand and the changes needed are more and more complex.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I got rid of it, we'll see what @ondrejmirtes says :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to add the whole and unpatched RegexGrammar.pp as first/separate commit, then the patched/actual grammar and then the changes from this PR. This will hugely improve the review and git history.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, done 👍🏻

resources/RegexGrammar.pp Outdated Show resolved Hide resolved
@Seldaek Seldaek marked this pull request as ready for review July 17, 2024 21:03
@phpstan-bot
Copy link
Collaborator

This pull request has been marked as ready for review.

@Seldaek
Copy link
Contributor Author

Seldaek commented Jul 18, 2024

@staabm please review the last 45 commits and especially test changes.

Copy link
Contributor

@staabm staabm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks really good for me.

only open question (which Ondrej has to decide on): what todo with the RegexGrammar.pp.



// Skip.
%skip nl \n
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\n can be in pcre class as well

Copy link
Contributor Author

@Seldaek Seldaek Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I am unsure why this is here, I haven't experimented much, but it is rare to have actual newlines in regexes as they're usually escaped as \n, while this here matches a real newline.

I'll create a new issue for this to look at it later, together with proper support for the x modifier, because it's very much related. Edit: phpstan/phpstan#11360

staabm and others added 3 commits July 19, 2024 14:01
- Skip regex parsing if the pattern is not valid, but report errors then if parsing fails
- Fix support for internal options to include more options, and fix []] (i.e. class end delimiter as first character is not a literal) not parsing correctly
- Add a couple missing quantifiers in getQuantificationRange
- Handle more literal values and fix escaped ones to be unescaped
@ondrejmirtes ondrejmirtes merged commit fc75deb into phpstan:1.11.x Jul 19, 2024
154 checks passed
@ondrejmirtes
Copy link
Member

I cherry-picked the grammar in the repo directly to 1.11.x, rebased this PR, and merged it. Thanks! :)

@Seldaek Seldaek deleted the bug11323 branch July 19, 2024 12:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants